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I. INTRODUCTION 

One of the most difficult challenges faced in compu¬ 
tational physics is simulating the equilibrium states of 
disordered systems with rough free energy landscapes, 
such as spin glasses mm- Standard Markov-chain Monte 
Carlo methods operating at a single temperature, such as 
the Metropolis algorithm, equilibrate extremely slowly at 
low temperatures due to trapping in metastable states. 
Algorithms that make use of simulations at many tem¬ 
peratures partially solve this problem and are now the 
methods of choice for equilibrium simulations of disor¬ 
dered systems with frustration. Markov-chain Monte 
Carlo methods in this general category include paral¬ 
lel tempering Monte Carlo simulated tempering 

Monte Carlo [8] , and the Wang-Landau algorithm mm- 
Population annealing Monte Carlo mHia, the topic of 
this paper, is an alternative to these multicanonical algo¬ 
rithms. Population annealing also employs many tem¬ 
peratures. However, it is not a Markov-chain Monte 
Carlo method. Instead, population annealing is a se¬ 
quential Monte Carlo method M- Population annealing 
was introduced by Hukushima and Iba m and further 
developed in Refs. [12] and [13] . The algorithm was inde¬ 
pendently discovered and called “sequential Monte Carlo 
simulated annealing” in Ref. m- Sequential Monte 
Carlo algorithms are not commonly used in computa¬ 
tional physics. However, the approach has been applied 
to some problems, e.g., the diffusion Monte Carlo method 
for finding ground states of many-body Schrodinger equa¬ 
tions and Grassberger’s “Go with Winners” method [16] . 

Population annealing has been successfully used in 
large-scale simulations of Ising spin glasses by the present 
authors [nnn]. One of the purposes of this paper is to 
give additional details of the implementation and perfor¬ 
mance of the population annealing algorithm as used in 
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Refs. [17] and [18]. We have also successfully used pop¬ 
ulation annealing as a heuristic to find spin-glass ground 
states m and have shown that population annealing is 
comparably efficient to parallel tempering Monte Carlo 
while both are far more efficient than simulated anneal¬ 
ing [20]. The current state-of-the-art algorithm for sim¬ 
ulations of spin glasses and related systems with rough 
free-energy landscapes is parallel tempering Monte Carlo. 
Another purpose of this paper to argue that population 
annealing is a useful alternative and potentially superior 
to parallel tempering for large-scale studies of the equi¬ 
librium properties of spin glasses and other disordered 
systems. We carry out a detailed comparison of pop¬ 
ulation annealing and parallel tempering for simulating 
spin glasses and we find that the two methods are com¬ 
parably efficient for sampling thermal states, although 
each method has advantages and disadvantages. We also 
develop a theory that quantifies the rate of convergence 
to equilibrium of population annealing and compare the 
theoretical predictions to simulations. 


The outline of the paper is as follows. In Sec. |TT] we 
introduce both population annealing and parallel tem¬ 
pering Monte Carlo, and in Sec. |HI| we discuss several 
features of population annealing. Sectionjiyjis concerned 
with the systematic and statistical errors in population 
annealing and the section concludes with a comparison 
of errors to those in Markov chain Monte Carlo meth¬ 
ods such as parallel tempering. Section [V] introduces 
the Edwards-Anderson Ising spin glass model and gives 
details of the simulations and the quantities that were 
measured. Section jV!] presents the results of large-scale 
simulations using both population annealing and parallel 
tempering with an emphasis on elucidating the proper¬ 
ties of population annealing and comparing them to par¬ 
allel tempering. The paper concludes with a discussion 
in Sec. IVHI 





2 


II. POPULATION ANNEALING AND 
PARALLEL TEMPERING MONTE CARLO 


A. Population annealing 


Population annealing (PA) is closely related to simu¬ 
lated annealing [20] (SA) except that it uses a popula¬ 
tion of replicas and this population is resampled at each 
temperature step. Like simulated annealing, PA involves 
lowering the temperature of the system through a se¬ 
quence of temperatures from a high temperature where 
equilibration (also known as thermalization) is easy to 
a low, target temperature Tq where equilibration is dif¬ 
ficult. Unlike SA, PA is designed to simulate the equi¬ 
librium Gibbs distribution at each temperature that is 
traversed. The resampling step ensures that the popula¬ 
tion stays close to the equilibrium ensemble. Just as in 
simulated annealing, at each temperature, each replica is 
acted on by a Markov-chain Monte Carlo (MCMC) pro¬ 
cedure such as the Metropolis algorithm. The annealing 
schedule consists of a sequence {/Satj,-!, ...,/3o} of Nt 
inverse temperatures, j3 = 1/T labeled in descending or¬ 
der in /3 so that /Jj+i < Pj. In our studies, the inverse 
temperatures are equally spaced, starting at infinite tem¬ 
perature, Pnt-1 = 0 and ending at /?o = 5. The MCMC 
method is the Metropolis algorithm [2TJ [22] , which is ap¬ 
plied for Ns sweeps at each temperature. The initial 
population size is R and each replica is independently 
initialized with random spins corresponding to an infi¬ 
nite temperature ensemble. In our implementation the 
population size fluctuates. In a given run at inverse tem¬ 
perature P the population size is Rp with a mean value 
of i?. 

The resampling step uses differential reproduction of 
replicas with a number of copies depending on the 
replica’s energy. Let Ej be the energy of replica j. Given 
an equilibrium population at /3, the goal is to resample 
the population so that it is an equilibrium population at 
P' with P' > p. For a replica j, with energy Ej the ratio 
of the statistical weights at P and P' is exp [—{$' — P)Ej]. 
Note that this is the reweighting used in the histogram 
re-weighting method [23] . 

Since the typical energy is large and negative, the 
reweighting factor is much greater than unity so that a 
normalization is needed to keep the population size close 
to R. The normalized weights Tj{P,P') are given by, 


T-jiP^P') 


Q{P,P') 


( 1 ) 


where Q{P,P') is the normalization. 


= ( 2 ) 

Note that the sum of Tj{P, P') over j is R. 

The new population at temperature P' is obtained by 
resampling the old population such that the number of 


copies rij of replica j is a random non-negative integer 
whose mean is Tj(P,P'). There are many ways to choose 
rij to satisfy this requirement (23]. Some of these meth¬ 
ods such as a multinomially distributed rij keep the pop¬ 
ulation size fixed while others allow the population size to 
fluctuate. We choose a method that minimizes the vari¬ 
ance of rij and has '/R fluctuations in the population size. 
Let rij be [r^J with probability 1— ( tj — [tj J) or [ tj ] with 
probability (r, — [r,J) where [r,] is the floor (greatest 
integer less than) Tj and |"rj] is the ceiling (least integer 
greater than) Tj. By minimizing the variance of rij we 
reduce correlations in the population while the fluctuat¬ 
ing population size creates a small overhead in memory 
usage. 

Consider a single temperature step in PA. If the orig¬ 
inal, higher temperature population is an equilibrium 
ensemble representing the Gibbs distribution at inverse 
temperature P then the final, lower temperature popula¬ 
tion is also an equilibrium ensemble at inverse tempera¬ 
ture P' . However, the new population is correlated due 
to copying replicas in the resampling step and, for finite 
i?, the new population represents a biased ensemble due 
to a lack of representation of the low energy tail of the P' 
Gibbs distribution. These errors are partially corrected 
by the MCMC sweeps P' . Statistical and systematic er¬ 
rors are discussed in Sec. llVl 

The full PA algorithm is a sequence of Nt — 1 an¬ 
nealing steps starting from infinite temperature. In each 
annealing step the population is resampled and then Ns 
sweeps of the Metropolis algorithm are carried out on 
each replica. 


B. Parallel Tempering 


In this section, we briefly describe parallel tempering 
(PT) Monte Carlo, the state-of-the-art Monte Carlo algo¬ 
rithm for spin glasses and many other frustrated systems. 
Parallel tempering is a Markov-chain Monte Carlo algo¬ 
rithm while population annealing is a sequential Monte 
Carlo algorithm, nonetheless, the two algorithms share 
many similarities. In parallel tempering, a set of Nt tem¬ 
peratures is used ranging from a high temperature that 
is easy to equilibrate to a low temperature of interest, Tq- 
There is a single replica of the system at each of these 
temperatures and each of these replicas is operated on 
by a MCMC method such as the Metropolis algorithm 
at that temperature. After Ns sweeps of the replicas 
at their respective temperatures, replica exchange moves 
are proposed. In a replica exchange move, replicas at 
two temperatures are proposed for swapping. Typically, 
the two temperatures are chosen to be neighboring tem¬ 
peratures in the list of temperatures. Let these two in¬ 
verse temperatures be P and P' with P' > p and let E 
and E' be the respective energies of the replicas at these 
two temperatures. The swap is accepted with probabil¬ 
ity min [1, exp[(/3' — P)(E' — A)]]. It is easily shown that 
this swap probability satisfies detailed balance with 
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respect to the product measure of Gibbs distributions at 
the Nt temperatures so that PT converges to equilib¬ 
rium at each temperature. Diffusion of replicas from low 
temperature to high temperature and back, called ‘round 
trips’ allows PT to surmount barriers in the free energy 

landscape mm- 


III. FEATURES OF POPULATION 
ANNEALING 

A. Free energy estimator 


B. Weighted averages 

Many independent runs of PA for the same system may 
be combined to reduce both systematic and statistical 
errors in the measurement of an observable A. Suppose 
we have carried out M independent runs and obtained 
estimates Am, m Let Fm{fi) be the free 

energy estimated in run m at the measurement temper¬ 
ature 1//3. If the different runs have different population 
sizes, let Rm be the nominal population size in run m. 
Then, the best estimator, A for the thermal average of 
the observable is. 


The free-energy difference between the highest and 
lowest temperature is easily measured using population 
annealing |12] . If the highest temperature is infinity, as 
is the case in our implementation of the algorithm, then 
the absolute free energy can be measured. The key idea 
is that the normalization factor, defined in Eq. ([^, is the 
ratio of the partition functions at the two inverse temper¬ 
atures P io fi'. To see this, expand the definition of Z{(3') 
and replace the resulting average at (3 by its population 
estimate, 

zm 

Z{P) 


The summation over 7 in the first two lines in the above 
expressions is over all possible spin configurations while 
the summation in the last line is over the population. 
Since F = —TlnZ, the estimator of the free energy F at 
each simulated temperature is, 




-B’ E~, 


E 


zm 


-BE-, 


^ Z{(3) ^ 






)/3 


Rb _,=i 




N'j' — 1 

- l3kF{/3k) = ^ lnQ{j3e,j3e-i)+\nn, (4) 

e=k-^-i 


where Q is the number of microstates of the systems and 
{Pi} is the sequence of inverse temperatures in descend¬ 
ing order: Pnt-i = 0 Po = l/’Tb is the inverse of the 
target temperature. For Ising systems, = 2^ where N 
is the number of spins. Since the number of temperatures 
in population annealing is typically in the hundreds, it 
is straightforward to accurately measure free energy, en¬ 
ergy and entropy as a continuous function of temperature 
over the whole range of temperatures from infinity to Tg. 
It should be noted that the same method can also be ef¬ 
ficiently employed to measure free-energy differences in 
PT Hg. 


T,m=l-^mRmexp[-PFmiP)] 


EEi Rrn exp[-PFmiP)] 


( 5 ) 


To justify this formula, consider an unnormalized variant 
of population annealing in which the population is not 
kept under control but is allowed to grow exponentially. 
In the resampling step in the unnormalized version of 
PA, the expected number of copies of replica j is simply 
the reweighting factor exp [—{P' — P)Ej\. Unnormalized 
PA is equivalent to standard PA except that it requires 
exponential computer resources and yields better statis¬ 
tics. Without the normalization factor in the resampling 
step, each replica evolves independently and combining 
separate runs of the unnormalized algorithm requires no 
weighting factor other than the obvious weighting by the 
population size, Rm- Thermal averages in unnormalized 
PA are obtained using simple averaging. The simple av¬ 
erage in unnormalized PA becomes a weighted average 
in standard PA because the populations in different runs 
of standard PA have been normalized differently. Specifi¬ 
cally, the product of the normalization factors Q [Eq. <§] 
from the highest temperature to the measurement tem¬ 
perature is the ratio of the population size in unnormal¬ 
ized PA to the population size in standard PA. But this 
product is proportional to the exponential of the free en¬ 
ergy [see Eq. Q] justifying the use of i?,„ exp[—/3iE,{/3)] 
as the weighting factor in standard PA. Note that ob¬ 
servables such as the spin and link overlap that involve 
more than one independent copy of the system may also 
be estimated using weighted averages from multiple in¬ 


dependent runs as discussed in Sec. VD 


Weighted averaging for the dimensionless free energy 
is more complicated because the free energy involves 
measurements at all temperatures, however, as shown in 
Ref. [12], the final result is relatively simple: 


— PF = In 


EEi exp[-PFm] 

Z^m=l 


( 6 ) 


This equation is obtained from Eq. Q and the fact that 
Q{Pi, Pi-i) is an observable for which weighted averaging 
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applies, but at inverse temperature fig. Thus 
- PkF{Pk) = 

Yl,m=i QnijPt, l3e-i)Rjn exp[- PeFmjPi)] 
Em=i exp[-^fF,„(A)] 

+ In n. 

( 7 ) 

This complicated equation for the weighted average of 
the dimensionless free energy collapses to Eq. after 
using the fact that 

QmWe, Pe-i) exp[-PiF^{l3i)] = exp[-/3f_iF^(/3^_i)], 

( 8 ) 

and also noting that the weighting factor at /3 = 0 is 
simply Rm and setting j5k = /3- 

It is important to understand that combining multi¬ 
ple independent runs with weighted averaging reduces 
both statistical errors and systematic errors. By con¬ 
trast, ordinary averaging reduces only statistical errors. 
It is obvious that more measurements should reduce sta¬ 
tistical errors. Systematic errors are reduced because the 
weighted average of multiple runs is identical to simulat¬ 
ing a larger population size and systematic errors dimin¬ 
ish with population size. Indeed, all ensemble averaged 
quantities are exact in the limit of an inhnite population 
size or, equivalently, using weighted averaging in the limit 
of an infinite number of runs with fixed population size. 
If the variance in PF{/3) is much less than unity, there is 
little difference between weighted averaging and simple 
averaging. However, if the variance of the free energy is 
large, the weighting factors, which depend exponentially 
on the free energy, are broadly distributed, and the two 
averages differ substantially. As we shall see in the next 
subsection, the variance of the free-energy estimator is a 
fundamental quantity in understanding systematic errors 
in PA. 

Note that there is no method available for combin¬ 
ing independent runs of a MCMC algorithm to decrease 
systematic errors. The most comparable procedure to 
weighted averaging for MCMC algorithms is “checkpoint¬ 
ing.” In checkpointing, the complete state of the system 
is saved at the end of the simulation. If results with 
smaller systematic errors are required, the simulation 
can be restarted beginning with the final state of the 
previous simulation so that averaging is initiated after a 
longer initialization period. Compared to weighted av¬ 
eraging, checkpointing requires substantially more stor¬ 
age because the full configuration of the system must be 
stored, instead of just the estimators for the observables 
and the free energy. In addition, checkpointing must be 
done sequentially while weighted averaging can be car¬ 
ried out using multiple parallel runs. It is a significant 
advantage of PA that independent runs can be combined 
to improve systematic errors (equilibration), which is not 
possible in PT. 


C. Macroscopic degrees of freedom 

Some problems in computational statistical mechan¬ 
ics require averaging over a small discrete set of macro¬ 
scopic degrees of freedom in addition to a much larger 
number of microscopic degrees of freedom. An exam¬ 
ple of this situation is thermal boundary conditions for 
spin models m- For thermal boundary conditions in d 
space dimensions, the 2'^ combinations of periodic and 
anti-periodic boundary conditions in the d directions are 
all included in the thermal ensemble. Each combina¬ 
tion of spin configuration and boundary condition ap¬ 
pears in the ensemble with its Boltzmann weight. Since 
differing boundary conditions will have energies that dif¬ 
fer by the surface area of the system, energy differences 
are much greater than for single spin flips. Large energy 
differences between different boundary conditions imply 
that Metropolis moves to change boundary conditions 
will be strongly suppressed except at very high tempera¬ 
ture. Macroscopic degrees of freedom such as the bound¬ 
ary conditions in thermal boundary conditions can be 
easily handled by PA if the starting temperature in the 
simulation is ,0 = 0. At infinite temperature, each macro¬ 
scopic state appears with the same probability so the 
initial population is set up with equal fractions in each 
macroscopic state. For example, in three-dimensional 
{d = 3) Ising spin-glass simulations with thermal bound¬ 
ary conditions, 1/8 of the population is initialized in each 
of the 2^ = 8 boundary conditions. No Monte Carlo 
moves are required to change boundary conditions dur¬ 
ing the remainder of the simulation because the resam¬ 
pling step correctly takes care of adjusting the fraction of 
each boundary condition in the population. We have suc¬ 
cessfully used this method to carry out large-scale sim¬ 
ulations of the Edwards-Anderson spin glass in thermal 
boundary conditions nanHi. A more accurate but also 
more costly method is to simulate each macroscopic state 
separately and then combine them with weights given by 
the exponentials of their respective free energies. Macro¬ 
scopic degrees of freedom can also be efficiently simulated 
using PT [27]. 

IV. SYSTEMATIC AND STATISTICAL ERRORS 

A. Systematic errors and the variance of the free 
energy 

Systematic errors in PA reflect the fact that for hnite 
population size R, the population is not an unbiased sam¬ 
ple from the Gibbs distribution. For PA, the algorithm 
‘equilibrates’ to the Gibbs distribution as R increases. In 
this section we study the convergence in R to the equilib¬ 
rium Gibbs distribution. Consider the weighted average 
of M runs each with fixed population size R. In what fol¬ 
lows a fixed value of R is implicit in the notation. We ar¬ 
gued in Sec. [Ill B] that the exact Gibbs ensemble average 
(A) of observable A is obtained by weighted averaging in 


Nt-1 


In 


£=k+l 
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the limit of infinitely many runs, 


(.4) = ita _ ,g) 


Em=iexp[-/3F„(/3)] 

Replacing the sum over runs by an integral over classes 
of runs with similar values of A and F, we obtain, 


{A) = 


J f X pAF{x,y)exp[-f3y]dxdy 
J PF{y)exp[-f]y]dy 


( 10 ) 


where pf{') is the probability density for free energy es¬ 
timator F and paf{'-: ’) is the joint probability density 
of measuring observable A and free energy estimator F. 
The average of the estimator ^ in a single run of PA, (A) 
is 


(A) 


J X pj^(x)dx. 


( 11 ) 


large number of nearly independent members of the pop¬ 
ulation. Thus, for large R we expect the simpler expres¬ 
sion. 


Af = f 4 


(16) 


to become exact. 

Similarly, for large R we expect the joint distribution 
PAF in Eq. (101 to be a bi-variate Gaussian defined by the 


means and variances of A and E, and their covariance, 
cov(A, E). Carrying out the Gaussian integrals for (A) 
in Eq. (10) we obtain for the equilibrium value of the 


observable. 


{A) = PA- Pcov{A,F). (17) 


Thus, the systematic error AA = (A) — (A) in estimating 
the observable A in a simulations with population size R 
is given by 


Note that the difference between the integrals for (A) 
and (A) is simply the weighting factor exp[—,5E]. The 
difference, /S.A = {A) — {A) is the systematic error in 
measuring A in a single run of PA with population size 
R. 

Systematic errors for the free energy present a simpler 
situation. The cumulant generating function (j) of pf is 
defined as. 


AA = fd cov{A, F). (18) 

We see that for large R, the systematic error in any ob¬ 
servable is proportional to the covariance of the observ¬ 
able with the free-energy estimator. This expression for 
the systematic error in A can be rewritten in a form that 
emphasizes the central role of the variance of the free 
energy, 



r f 1 

, ^ AA = var(/3F) 

cov(A, I3F) 

0 

II 

/ dyexp[zy]pF{y) 

(12) 

var(/3F) 


(19) 


But </)(—/3) is the integral expression for the weighted 
average of the dimensionless free energy, see Eq. § with 
constant Rm- Thus, the equilibrium free energy F is 
related to the distribution of the free energy estimator 
via, 

F = (13) 


It is expected that the quantity in the square brackets 
will be nearly independent of R so that systematic errors 
in A are proportional to the variance of dimensionless 
free energy, just as is the case for the free energy itself. 

A central limit theorem argument suggests that 
var{l3F) decreases as 1/i? so that the product Rvai{/3F) 
should approach a constant. Define the equilibration pop¬ 
ulation size, Pf as 


while the expected value of the free energy estimator from 
a single run, (F), is given by 


Pf = lim R var(/3F). (20) 

R^oo 


= M-P (14) 

where pf the mean oi pf- The systematic error in the 
free energy is the difference between these expressions, 
AF = (F) — F. Since (j){z) is the cumulant generating 
function, we see that 


(F) = 


n=3 


(15) 


where (7„ is the n*^ cumulant of pf and (t|. = C 2 is the 
variance of pf • 

Eor large population size (i? 3> 1) an argument based 
on the central limit theorem suggests that pf should be¬ 
come a Gaussian since F is the sum of contributions from 


The population is in equilibrium when R is much larger 
than Pf. Define SA/fiSF as the limit of the quantity in 
the square brackets in Eq. (19): 


SA cov(A,(3F) 

-^ = hm 

PSF R^oo Mar{j5F) 


( 21 ) 


Given these definitions, the asymptotic theoretical pre¬ 
diction for systematic errors is that, 


AA^%-^, ( 22 ) 

R I36F 

for any observable A, except the free energy. Eor the free 
energy, the simpler expression holds: 


AF = 


Pf 

213 R' 


(23) 
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Note that 5A/5F can be interpreted as the slope of 
the regression line through the joint distribution 
To see this, let {x \ y) be the conditional average of x 
given y. For a general bivariate normal distribution, the 
conditional average is given by 

{x\y) = Px+ {y - y,y)^ (24) 


Next, make the additional approximation, which leads 
to an even weaker upper bound, that the variance of the 
value of the observable in each family is var(y4), the full 
variance of the observable in the thermal ensemble. In 
particular, we are ignoring the possibility that the ob¬ 
servable is correlated with the family size. For a given 
distribution of family sizes, we obtain the variance of the 
estimator of the observable var(^): 


from which one sees that cov(^, F)/var(F) is the slope 
of the linear dependence of A on F. One should not, 
however, consider Eq. (p3| to be a special case of Eq. (22) 


by setting 5F / 5F = 1 since the free-energy error equation 
has an extra factor of 1/2. 

Eor weighted averages, we expect similar results for 
systematic errors but with R replaced by MRq, where 
Rq is the size of the individual runs and M the num¬ 
ber of runs in the weighted average. The substitutions 
R — >■ MRq in Eqs. (221 and (231 should become exact for 
weighted averages as Ro/pf —>■ oo but for finite Rq/ pf, 
where the joint distribution is not close to a bivariate 
Gaussian, the dependence on M may be more compli¬ 
cated. 


B. Statistical errors 

The statistical error 5A of an observable A is the 
square root of the variance of the estimator 

6A = [var(i)]i/2. (25) 

Statistical errors scale inversely in the square root of the 
number of independent observations. In the absence of 
resampling, the number of independent measurements in 
PA is the population size R. However, the resampling 
step makes identical copies of replicas and thus correlates 
the population so that the effective number of indepen¬ 
dent measurements is less than R. On the other hand, 
MCMC sweeps at each temperature decorrelate the repli¬ 
cas. Thus, if we consider only the correlating effect of 
resampling we obtain an upper bound on the statistical 
errors. 

Eamily trees can be constructed for each member of the 
initial population. Call all the descendants of replica i in 
the initial population a family and let be the fraction 
of the population in family i. In a typical PA simula¬ 
tion starting at infinite temperature and ending at a low 
temperature the great majority of initial replicas have 
no descendants, = 0. To obtain an upper bound that 
ignores the decorrelating effect of the MCMC sweeps, as¬ 
sume that observable A takes a single value Ai for every 
member of family i. If the MCMC algorithm applied at 
each temperature step were completely ineffectual, this 
would be the case. Given this assumption, the estimator 
A for the full simulation is 

A = (26) 

i 


var(A) < var(A) (27) 

I 

Note that if every family contained one member and 
there were R families then = 1/i? from which we would 
obtain the result for R independent measurements, that 
SA = [var(A)/i?]^/^. More generally, the statistical er¬ 
rors are bounded by the second moment of the family 
size distribution. Suppose this moment scales as 1/i? 
and define the mean square family size, pp. 

pt= Yim R'^nj. (28) 

R-^oo 


In terms of pt, the bound on the statistical error in 5A is 

5A < yjvai{A)Pt/R- (29) 

The quantity R/pt is an effective number of independent 
measurements. 

A second measure of the effective number of families is 
related to the entropy S'/ of the family size distribution 

Sf =-^rXilnni. (30) 

i 

The exponential e^f is an effective number of families. 
Suppose R/e^f has a limit and define the entropic family 
size, ps, 

Ps = lim R/e^F (31) 

R —¥00 

The quantity R/Ps is an alternative measure of the num¬ 
ber of independent measurements. If every family is a 
singleton then ps = pt = Y. If the family size distribu¬ 
tion is exponential with mean p, then it is straightforward 
to show that pt = 2p and Ps « 1.53/r. As we shall see 
in Sec. |VIB| these two measures are always close to one 
another. All of the characteristic sizes, Pf,Ps and pt are 
defined as limits as R goes to infinity but, in practice, we 
measure them at a fixed large R. 


C. Comparison of errors in population annealing 
and Markov-chain Monte Carlo algorithms 

In the previous two subsections we have seen that sys¬ 
tematic and statistical errors in PA both decrease with 
population size i?; systematic errors diminish as 1/i?, 
while statistical errors diminish as \/\/R. PA is a se¬ 
quential Monte Carlo method while the great majority 
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of simulation methods in statistical physics are MCMC 
methods. For MCMC methods observables are measured 
using time averages rather than ensemble averages as is 
the case for PA and the equivalent quantity to popula¬ 
tion size is the length of the run, T. Errors are reduced 
by increasing the running time and are estimated from 
the autocorrelation functions of observables. Systematic 
errors in MCMC diminish as exp(—T/Texp) where Texp is 
the “exponential autocorrelation time,” while sta tistical 
errors in an observable A diminish as \/2r^j/7* where 
is the “integrated autocorrelation time” for A, see, 
for example, Ref. [55] for a discussion of integrated and 
exponential autocorrelation times. 

In a loose sense we can equate the equilibration popu¬ 
lation size pf with the exponential autocorrelation time 
Texp and either of the family size measures, ps or pt, with 
integrated autocorrelation times. Naively, it would ap¬ 
pear that even if the measures Texp and pf were compa¬ 
rable, a MCMC method would have a considerable ad¬ 
vantage over PA because MCMC algorithms converge ex¬ 
ponentially in the amount of computational work rather 
than inversely. On further reflection, one can see that 
the exponential advantage of MCMC is mostly illusory 
because of statistical errors, which decrease only as the 
inverse square root of the amount of computational work 
for both MCMC and PA. For both types of algorithms, 
the systematic errors are dwarfed by the statistical errors 
for simulations of a single system. 

However, for disordered systems, it is usually necessary 
to carry out an additional average over many realizations 
of the disorder. Statistical errors for disorder-averaged 
quantities decrease with the number of disorder realiza¬ 
tions, n as Ijy/n. When n is large enough, there could be 
a regime where statistical errors in disorder averages are 
smaller than the systematic errors of PA. To investigate 
this issue more quantitatively, consider an observable A 
and its disorder average [(A)Lwhere [.. indicates a 
disorder average. Using Eq. (22) we have the following 
expression for the systematic error in the disorder aver¬ 
age, A [(A)]^: 


A[(^)]d 


Pf 5A 

'rJ^f 


(32) 


Let S [(A)]^ be the statistical error in [(A)]^ and suppose 
that the main contribution to this statistical error comes 
from the variance with respect to disorder in (A) defined 
by 

(33) 

Systematic errors are negligible relative to statistical er¬ 
rors if A [(A)]^ <C yjn where n is the number of dis¬ 
order realizations in the sample. Thus, systematic errors 
are negligible if 


i/n Pf 6A 
5A 


(34) 


and for the free energy we have the simpler expression. 


yn 


Pf 


2I3R 


< 1 , 


(35) 


- d 


In our L = 10 simulations of the Edwards-Anderson 
model, discussed in the following, n = 5000 and Yp = 
23.9 at /3 = 5. Our equilibration criterion requires that 
Ps/R < 10“^ and, for most instances, Ps/R ^ 10“^. As 


we shall see in Sec. VIB Pf is typically less than a factor 
of 2 larger than ps. Thus, the left-hand side of Eq. ( [^ 
for the disorder average of the free energy is less than 
10 “^ and we are safely in the regime where statistical 
errors greatly exceed systematic errors. 


V. MODEL, SIMULATION DETAILS, AND 
OBSERVABLES 


A. Edwards-Anderson model 

We test the performance of PA and compare it to PT 
in the context of the three-dimensional (3D) Edwards- 
Anderson (EA) Ising spin glass model [53], defined by 
the Hamiltonian 


n = (36) 

(id) 


where Si S {±1} are Ising spins and the sum is over 
nearest neighbors on a cubic lattice of linear size L with 
periodic boundary conditions. The random couplings Jij 
are chosen from a Gaussian distribution with zero mean 
and unit variance. A set of couplings J = {Jij} defines 
a disorder realization or “instance.” 

Sampling low-temperature equilibrium states of the 3D 
EA model is computationally very difficult. It is known 
that finding ground states of the 3D EA models is an 
NP-hard computational problem |5Dj and it is believed 
that sampling low-temperature equilibrium states is also 
exponentially hard in the sense that the amount of com¬ 
putational work needed to achieve a fixed accuracy in 
sampling grows exponentially in the system size L. For 
MCMC algorithms, this intuition can be made more pre¬ 
cise as a statement about the L dependence of autocorre¬ 
lation times, while for PA it is a statement about quan¬ 
tities such as Pf, pt and ps, introduced in Sec. 
characterize population sizes required for equi 


IV which 
ibration. 


There are large sample-to-sample variations in the 
difficulty of sampling equilibrium states of the 3D EA 
model. It is known that the distribution of integrated 
autocorrelation times and other equilibration measures 
for PT is approximately log-normal [3T1 - I33] . One of the 
important question studied in Sec. |VI| is whether PA and 
PT both find the same spin-glass instances to be either 
hard or easy. 

There are two reasons why the 3D EA model is com¬ 
putationally difficult that can be understood intuitively 
in terms of the free-energy landscape. The first reason 















is that the tree-energy landscape is rough for typical in¬ 
stances with several relevant local minima separated by 
high barriers. Both PT and PA are designed to par¬ 
tially overcome this source of computational hardness 
though it certainly plays a role [SS]. The second rea¬ 
son is related to temperature chaos [HI [SH [35] , which 
is effectively a change in dominance between minima in 
the free energy landscape as a function of temperature. 
At high temperatures free-energy minima with large en¬ 
tropies dominate while at lower temperatures free-energy 
minima with low energies dominate and finding these rare 
low-energy states is difficult for both PA and PT. 

Extensive numerical evidence supports the idea that 
the 3D EA model undergoes a second-order phase tran¬ 
sition from a paramagnetic high-temperature phase to 
a spin-glass phase at a temperature ~ 0.96 (see, for 
example, Ref. ED). Ordering in the spin-glass phase is 
detected using the overlap distribution. The overlap q is 
defined as 

i 

where N = is the number of spins, and the super¬ 
scripts “( 1 )” and “( 2 )” refer to two statistically inde¬ 
pendent spin configurations chosen from the Gibbs dis¬ 
tribution with the same disorder J . Let Pj{q) be the 
overlap distribution for instance . In the paramagnetic 
phase and for large systems, Pj{q) is concentrated near 
g = 0 , showing that independent spin configurations cho¬ 
sen from the ensemble have little correlation. The behav¬ 
ior of Pj{q) for large L in the spin-glass phase is the sub¬ 
ject of a longstanding controversy but it is agreed that 
there are two peaks at ±gEA with ^ea > 0 and ^ea —> 1 
as T —>■ 0. The controversy concerns whether or not 
Pj(q) simply consists of two delta functions at igsA as 
predicted by the droplet picture or whether there 

is a forest of smaller ^-functions between —gsA and -I-^ea 
as predicted by the replica symmetry breaking (RSB) pic¬ 
ture |39l ED] ■ The droplet picture asserts that the spin- 
glass phase consists of two thermodynamic pure states 
related by a global spin flip while the RSB picture asserts 
that there exists a countable infinity of thermodynamic 
pure states. For finite systems, Pj{q) varies greatly from 
instance to instance with some disorder realizations re¬ 
sembling the predictions of the droplet picture and others 
the RSB picture. The weight of Pj{q) near g = 0 has 
been used to distinguish the two competing theories of 
the low temperature phase of the EA model. 

B. Simulation details 

The large data sets used in this study were obtained 
in previous studies of the low-temperature phase of spin 
glasses [171 El] , the dynamics of PT [S^ , and a compar¬ 
ison of PA and PT for finding ground states [19] . These 
data sets involve roughly n « 5000 disorder realizations 
for each of five system sizes, A = 4, 6 , 8 , 10, and 12 


(note that data for A = 12 using PT have also been 
simulated, too, albeit at a higher temperatures). The 
same set of disorder realizations were simulated using 
both PA and PT to allow for a detailed comparison be¬ 
tween the two algorithms. In addition, we carried out 
PA simulations for n = 1000 instances with A = 14. 
The parameters of the PA simulations are given in Ta¬ 
ble |Tj In our implementation of PA, the annealing sched¬ 
ule has temperatures that are evenly spaced in /3 = 1/T 
starting from infinite temperature. In all PA simulations 
we used Ns = 10 Metropolis sweeps per temperature. 
The number of Metropolis sweeps per simulation, W is 
given by W = RNsNt so that W is a rough measure of 
the computational work expended per spin in the simu¬ 
lation. For the A = 14 runs we used weighted averaging 
with M = 10 independent runs per instance so that here 
W = MRNsNt- 

The equilibration criterion that we use is that R > 
lOOps, which is equivalent to 5/ > In(lOO). It is worth 
mentioning here that ps converges rapidly as the popu¬ 
lation size grows (see Fig. 10). In our simulations, we 
first choose a population size for which most instances are 
equilibrated. A larger population is used for hard sam¬ 
ples and the process is iterated until all samples meet 
the equilibration criterion or until it becomes impracti¬ 
cal to increase R. In the latter case, we either use more 
temperatures or perform a weighted average. For M in¬ 
dependent runs, it is straightforward to show that the 
entropy of the family size distribution is given by 

M M 

Sf = ^ Sf^^Pi -'^Pz ln(pi)> (38) 

i=l i=l 

where Sf^i is the entropy of the family size distribution 
for run i and pi is the weight factor for run i, defined in 
Eq. 5. Note that if the Pi and Sf^i are both constants in¬ 
dependent of i, then from Eq. 31, ps is the same whether 
it is estimated from a single long run with population 
MR or M shorter runs, each of length R. 

The population size for each system size is listed in 
the column labeled R in Table |TJ This population size 
satisfies the equilibration criterion for most disorder re¬ 
alizations. However, for the hardest instances, runs were 
required with larger population sizes. The number of 
hard instances, nhard is listed in the last column of the 
aforementioned table. The PA simulations were carried 
out using OpenMP implemented on eight cores where each 
core works on a different subset of the population. In 
addition to the simulations described in Table [J we car¬ 
ried out a detailed study of a single A = 8 and a single 
A = 4 disorder realization in which we performed a large 
number of independent runs for various population sizes 
to check predictions concerning systematic errors. 

The parameters of the PT simulations are given in Ta¬ 
ble |llj In the implementation of PT, the highest temper¬ 
ature is A = 2 and each PT sweep involves Ns = 1 heat 
bath sweeps per replica. Each simulation involves 2^+^ 
PT sweeps, 2^ for equilibration and 2^ for data collection. 
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TABLE I: Parameters of the main population annealing sim¬ 
ulations m- L is the system size, R is the standard number 
of replicas, To is the lowest temperature simulated, Nt the 
number of temperatures (evenly spaced in /?) in the annealing 
schedule, and W = RNtNs is the number of sweeps applied 
to a single disorder realization, n is the number of disorder re¬ 
alizations and rihard is the number of hard instances requiring 
more than R replicas to meet the equilibration requirement. 
For L = 14 we used weighted averaging with M = 10 inde¬ 
pendent runs so W = MRNtNs for this case. 


L 

R 

To 

Nt 

W 

n 

^hard 

4 

5 X 10"* 

0.20 

101 

5 X 10^ 

4941 

0 

6 

2 X 10® 

0.20 

101 

2 X 10® 

4959 

0 

8 

5 X 10® 

0.20 

201 

10® 

5099 

5 

10 

10® 

0.20 

301 

3 X 10® 

4945 

286 

12 

10® 

0.333 

301 

3 X 10® 

5000 

533 

14 

3 X 10® 

0.333 

401 

1.2 X 10“ 

1000 

N/A 


TABLE IL Parameters of the parallel tempering simula¬ 
tions [31I1I]. L is the linear system size, 2^ is the standard 
number of Monte Carlo sweeps. To is the lowest temperature 
used, Nt is the number of temperatures, and W = 2^'^^NtNs 
is the number of sweeps applied to a single disorder realiza¬ 
tion. n is the number of disorder realizations. 


L 

b 

To 

Nt 

W 

n 

4 

18 

0.20 

16 

8 X 10® 

4941 

6 

24 

0.20 

16 

5 X 10* 

4959 

8 

27 

0.20 

16 

4 X 10® 

5099 

10 

27 

0.20 

16 

4 X 10® 

4945 


The number of heat bath sweeps per simulation and thus 
the computational work per spin is W = 2^~^^NsNt- In 
fact, for computing the overlap q, twice this number of 
sweeps were used because two independent simulations 
are needed to compute q in PT. Additional details of the 
PT simulations can be found in Ref. [55] . 

C. Measured quantities 

We measured standard spin-glass observables and also 
quantities intrinsic to the PA algorithm. We measured 
the internal energy Ej, free energy Fj, and spin over¬ 
lap distribution Pj{q) for all disorder realizations. From 
Pj(q) we obtained its integral near the origin, 

ij = n Pj{q)- (39) 

J-0.2 

From ij for the n instances we obtain the disorder av¬ 
erage I = {Ij\d- Unless required to prevent confusion, 
we henceforth drop the subscript indicating a partic¬ 
ular instance. Observables are obtained from population 
averages in contrast to the situation for PT and other 
MCMC methods where observables are obtained from 


time averages. Estimators of observables obtained from 
population averages for a single instance are indicated by 
a tilde. 

We estimated the family-based characteristic sizes, pt 
and Pa for each disorder realization. For the L = 14 and 
for the two individual size L = 4 and L = 8 instances 
we also measured the equilibration population size p/, 
which requires multiple runs. These quantities are de¬ 
fined as limits in R but are estimated from the finite R 
simulations. Comparison data for PT were obtained in 
previous studies [51111]. For the same set of disorder 
realizations, we have values of Ij and the integrated au¬ 
tocorrelation time for the spin overlap, Tj 

D. Spin overlap measurement 

The spin overlap is an important quantity in spin-glass 
studies and its integral near the origin, Ij, has been 
extensively studied as a way of distinguishing compet¬ 
ing pictures of the low-temperature phase of spin glasses. 
The measurement of the spin overlap distribution Pj{q) 
would appear to be computationally twice as difficult as 
other observables because it requires two independent 
spin configurations. Indeed, in standard implementations 
of PT, two separate simulations are run simultaneously 
and spin configurations from each are combined to obtain 
values of q, so the work required to measure P{q) (and 
also the link overlap distribution [12]) is twice that for 
observables obtained from a single spin configuration. In 
PA, however, it is possible to construct P{q) from a single 
run by taking advantage of the fact that replicas from dif¬ 
ferent families, i.e., descended from different initial repli¬ 
cas, are independent. We use the following method to 
estimate P{q) at a given temperature /3. 

First, a random permutation of the population, 
(tti , 7 r2 ,..., ) is constructed and used to make an ini¬ 

tial pairing of replicas in the population. A random per¬ 
mutation is likely to include pairs chosen from the same 
family. If replica k and replica are in the same family, 
they are potentially correlated. This “incest” problem 
is corrected sequentially by performing transpositions as 
needed. Suppose k is the least integer such that replicas 
k and tt^ are in the same family. Then the successive 
replicas 7rf.+2 • ■ ■ are tested until the first j (j > k) 
is found such that replica itj is in a different family than 
replica k and also replica tt*, is in a different family than 
replica j. The permutation is now modified by transpos¬ 
ing TTj and TT],. This process is continued until there are 
no more incestuous pairs. Each pair then contributes one 
value to the histogram for P{q). Notice that in each step 
of the procedure the number of incestuous pairs decreases 
by one. So long as the maximum family size is less than 
Rp/2, which is required anyway for a well-equilibrated 
run, this procedure will find an unbiased, nonincestuous 
pairing. Although the worst-case complexity of the pro¬ 
cedure is 0{R?), in practice the complexity is 0{K). 

Weighted averaging may also be used to combine re- 
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suits for Pj{q) from many runs with Pj{q) playing the 
role of the observable A in Eq. ([^. The justification 
for weighted averaging based on unnormalized population 
annealing holds, although the argument also requires the 
fact that each family in unnormalized PA is independent 
and identically distributed. 


VI. RESULTS 

In this section, we present results for both PA and PT. 
This section serves two purposes. The first purpose is to 
validate population annealing and verify claims made in 
Sec. |IV| about its statistical and systematic errors. The 
second purpose is to compare the efficiency of PA and 
PT. 


A. Spin overlap 

Figure [T] shows a scatter plots for sizes L = 4, 6, 8, and 
10 of for both algorithms, with the vertical position 
of each point the value of Ij for PT and the horizon¬ 
tal position the value for PA. Disorder realizations with 
lj=0 for either algorithm are not shown. This figure 
demonstrates reasonable agreement between the two al¬ 
gorithms for each disorder realization. Note that PT is 
capable of measuring smaller values oi Ij than PA be¬ 
cause the number of measurements 2^ for PT is larger 
than the number of measurements R for PA. 

Next, we consider I = [Ij]d, the disorder average of 
the integral of the spin overlap in the range from —0.2 < 
q < 0.2. Table [In| gives results for both PA and PT for I. 
The quoted errors are obtained from the sample variance 
divided by the square root of the sample size ^/n so it is 
not surprising that the difference between the PA and PT 
results is much less than the error since both algorithms 
use the same set of disorder realizations. It is comforting 
that the results are so close. Because both algorithms are 
quite different and use different criteria for equilibration 
it suggests that systematic errors are minimal and cannot 
be detected in disorder averages with a sample size of 
5000. 


TABLE III: Comparison of the disorder averaged overlap 
weight near the origin, I between PA and PT at T = 0.2 for 
the same set of disorder realizations. 


L 

4 

6 

8 

10 

PA 

0.0186(10) 

0.0194(10) 

0.0205(10) 

0.0200(10) 

PT 

0.0185(9) 

0.0196(9) 

0.0205(10) 

0.0198(10) 



10 -® 10-3 10 -® 10-3 1 
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FIG. 1: (Color online) Log-log (base 10) scatter plots of Ij. 
Each point represents a disorder realization. The horizontal 
position of the point is Ij measured in PA and the vertical 
position is the value of Ij measured in PT, for sizes, L = 4, 
6, 8 and 10 at T = 0.2. 


B. Characteristic population sizes in PA and 
correlation times in PT 

Next we consider quantities that are intrinsic to each 
algorithm and that are related to errors. Figurej^is a log¬ 
arithmic scatter plot of Ps, the entropic family size mea¬ 
sured in PA, and the integrated autocorrelation time 
for the spin overlap measured in PT. Each point repre¬ 
sents a disorder realization; the horizontal position of the 
point is logj^Q Pe and the vertical position is logj^Q Tj^^. . It is 
striking that the these two quantities are strongly corre¬ 
lated. Both ps and are related to statistical errors in 
their respective algorithms and large values correspond 
to hard instances that require lots of computer resources 
to simulate accurately. It is clear that the hardness of an 
instance for PA and for PT is strongly correlated. 

Figure P shows histograms of log^pPs (left panel) and 
logio (right panel) for all 4945 disorder realizations of 
size L = 10 at r = 0.2. Both distributions are very broad 
and both are skewed toward hard disorder realizations 
although the ps distribution is more sharply peaked than 
the distribution. 

Figure is a log-linear plot of the disorder averages 
[logiQ ps]d and [logio '^intld vs system size L. Square sym¬ 
bols (blue) are for PT at T = 0.2, circular symbols (red) 
for PA at T = 0.2, and triangular symbols (green) for PA 
at T = 0.42. The nearly linear behavior suggests that 
both algorithms suffer exponential slowing with system 
size as expected. The fitted slope is greater for PT than 
for PA, however, one should be cautious in interpreting 
these fits as indicating better scaling for PA relative to 
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FIG. 2: (Color online) Log-log scatter plot of ps (entropic 
family size for PA) vs (integrated autocorrelation time 
of the spin overlap for PT). Each point represents a single 
disorder realization and there are roughly 5000 disorder real¬ 
izations each for sizes L = 6, 8, and 10 at T = 0.2. 
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FIG. 3: (Color online) Histogram of logj^Q ps (left panel) and 
logjo (right panel) for all 4945 disorder realizations, size 
L = 10 at T = 0.2. 


PT. There is some upward curvature for PA in the data 
for both temperatures so the asymptotic scaling slope 
may be significantly larger than the finite-L slope. In 
addition, Tint and ps are not strictly comparable quan¬ 
tities and, finally, neither algorithm has been carefully 
optimized. Nonetheless, one can safely conclude that PA 
is at least comparable in efficiency to PT for the sizes 
studied - system sizes that are of current scientific inter¬ 
est across applications. 


In Secs. |IV A| and |IVB[ we introduced three charac¬ 
teristic population sizes, ps, pt, and pf. Both ps [see 
Eq. @] and Pt [see Eq. ( [2^ ] are obtained from the 
distribution of family sizes and are related to statistical 
errors while pf [see Eq. (20l] is obtained from the vari¬ 
ance of the free-energy estimator and controls systematic 



FIG. 4: (Golor online) Disorder averages [logj^Q Ps[d for PA 
and [logioSquare symbols (blue) are for PT at 
T = 0.2, circular symbols (red) for PA at T = 0.2, and trian¬ 
gular symbols (green) for PA at T — 0.42. Straight lines are 
best linear fits to the data. Note that the behavior of pt is 
qualitatively similar to the behavior of ps- 


errors. What is the relation between these three quanti¬ 
ties for spin glasses? Figure is a scatter plot of ps vs pt 
for system sizes L = 4, 6, 8, and 10. Each point repre¬ 
sents a single disorder realization. It is clear that these 
two measures are strongly correlated with ps serving as 
a lower bound for pt- 



logio (Ps) 

FIG. 5: (Color online) Scatter plot of ps, entropic family size 
vs Pt the mean squared family size for sizes L = 4, 6, 8, and 
10 at T = 0.2. 


Figure]^ is a scatter plot of ps vs pf for the n = 1000 
disorder realizations of size L = 14 where each point 
represents a disorder realization. The value of p/ is es¬ 
timated for each disorder realization from 10 runs with 
R = 3 X 10® and p/ is estimated as R times the sample 
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variance of PF from the 10 runs. Since it is obtained 
from only 10 runs, pf has large statistical errors. The 
straight line is a best fit through the data points. It is 
clear that ps and pf are strongly correlated although pf 
is on average a factor of 1.6 larger than ps- 



logio(Ps) 

FIG. 6: (Color online) Scatter plot of the entropic family 
size Ps vs equilibration population size pf for 1000 disorder 
realizations of size L = 14 at T = 0.42. The straight line is a 
best ht through the data. 


The strong correlation between ps and pf justifies us¬ 
ing R > lOOps as an equilibration criterion. In principle, 
equilibration (systematic error) is controlled by p/ but 
measuring pj requires multiple runs whereas measuring 
Ps requires only a single run. Thus, except for situations 
where weighted averaging is used, it is more straightfor¬ 
ward and reasonably well justified to require that the 
population size is some factor larger than ps- Because 
Pt is just as easy to measure as Ps, and pt > Ps, and pt 
is more directly related to statistical errors, it may be 
preferable to use pt rather than ps as an equilibration 
criterion in future simulations. 


C. Convergence to equilibrium 

Since statistical errors are much larger than system¬ 
atic errors, in order to investigate systematic errors, 
le.,convergence to equilibrium, it is necessary to carry 
out a very large number of independent simulations of 
the same disorder realization. From these many runs, 
systematic errors can be studied as a function of popu¬ 
lation size R. In this section we examine in detail the 
convergence to equilibrium for two disorder realizations. 
One of these disorder realizations is the hardest L = 8 
sample as measured by ps- This disorder instance was 
also studied in detail in Refs. [12] and [50]. We call 
this disorder realization “instance J8.” The second is 


TABLE IV: Equilibrium values of observables at T = 0.2 for 
the two disorder instances studied in detail, J4 and J8, of 
sizes L = 4 and 8, respectively. 



Pf 

-PF 

-E 

&E/PSF 

I 

Sl/PSF 

J4 

33 

584.138 

116.541 

0.0542 

0.0929 

0.0843 

J8 

9.0 X 10® 

4457.53 

890.186 

0.0355 

0.00104 

0.00105 


an L = 4 disorder realization that we call “instance J4.” 
Observables of the two instances are shown in Table IIVI 

We first carefully examine, for instance J8, the con¬ 
vergence to equilibrium as a function of R for the energy 
estimator E and the dimensionless free-energy estimator 
pF at temperature Tg = 0.2. Figure]^ shows histograms 
of the deviation of the dimensionless free-energy estima¬ 
tor from its equilibrium value, A/3F (top row), the devi¬ 
ation of the energy estimator from its equilibrium value, 
AE (middle row), and a scatter plot of their joint distri¬ 
bution (bottom row) for population sizes, R = 10^, 10^, 
10® and 10® (from left to right, respectively). For each 
population size we carried out A4 = 1000 independent 
simulations of J8. The ‘exact’ equilibrium values, listed 
in Table IV are obtained from a weighted average of the 
1000 runs at the largest population size, R = 10®. For 
the two smaller populations the distributions are highly 
non-Gaussian but as R increases the joint distribution 
approaches a bivariate Gaussian distribution. The joint 
distribution initially consists of two well-separated peaks 
representing the fact for small R most or all of the popu¬ 
lation is frequently stuck in a metastable state with both 
a higher free energy and higher energy. This bimodal 
distribution is a feature of this particular disorder real¬ 
ization and explains, in part, the computational hardness 
of this sample. Since pf « 10^, the R = 10® populations 
are not equilibrated and the R = lO'^ populations are 
barely equilibrated. Finally, for R = 10®, the popula¬ 
tions are reasonably well equilibrated so that the E and 
/3F distributions are close to Gaussian and the joint dis¬ 
tribution is close to a bivariate Gaussian. The slope of 
the regression line through the scatter plot representing 
the R = 10® joint distribution is an estimator of the 
quantity SE/I36E, which controls the error in the energy 
estimator, see Eq. (22). 

We can assess more quantitatively whether /3F and E 
are described by a bivariate normal distribution. From 
the Ai = 1000 runs, we measured the skewness and kur- 
tosis of both variables. For instance, for J8 and R — 10®, 
the skewness and (excess) kurtosis of the dimensionless 
free-energy runs is 0.047 and 0.043, respectively. Both 
values are statistically indistinguishable from values that 
would be obtained from a sample of 1000 normal ran¬ 
dom variates. The corresponding values of skewness and 
kurtosis for the energy are 0.121 and 0.152, respectively. 
Although larger, both values are consistent with a sam¬ 
ple of 1000 normal random variates. The joint distribu¬ 
tion is, however, only marginally consistent with a bi¬ 
variate Gaussian, as measured by the Mardia [13] com- 
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bined skewness and kurtosis test (p = 0.06). For in¬ 
stance, for J8 at i? = 10®, R/pf ~ 10^. For J4 at 
population size R = 10® we have R/pf « 3 x 10"^ and 
from M = 5000 runs the joint distribution cannot be 
distinguished from a bivariate Gaussian by the Mardia 
combined test {p = 0.4). 


R = 10^ R= uh 


R= lO'^ 
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FIG. 7: (Color online) Histograms of A/3F (top row), his¬ 
tograms of AE (middle row), and scatter plots representing 
the joint distribntions of AE and ApF for instance J8 at 
T = 0.2 (bottom row). Each column represents a given pop- 
nlation size and, from left to right, R — 10^, lO'^, 10®, and 
10®, respectively. The slope of the regression line in the AE 
vs AfiF scatter plot for R = 10® (lower right box) is the 
estimator of 5E/fiSF. 


Figure shows similar results for the size 8 instance 
J8. Since the joint distributions are far from bivariate 
Gaussians for the smaller values of R for instance J8, 
the theoretical predictions for (Aif), and (AJ) are poor 
for the smaller population sizes. The points for R — 10® 
in Fig. are in essentially perfect agreement with the 
theoretical predictions of Eq. (221, however, since pf and 
SA/I36F are all measured at i? = 10® this agreement is 
really just a check that the joint distribution is close to 
the assumed bi-variate Gaussian form. 



FIG. 8: (Color online) Log-log plot showing the deviations 
from equilibrium (systematic errors) in the dimensionless free 
energy, {Aj3F) (red circles), energy, (AE) (bine squares) and 
overlap near the origin (AI) (green triangles) as a function of 
population size R for instance J4 at T = 0.2. The straight 
lines are theoretical curves based on Eqs. \22\ and (|23[). 


Next, we study the convergence of the mean values 
of observables to their equilibrium values as a function 
of R. For each observable A we obtain the mean value 
for a single run (^) from a simple average over all M. 
runs for each population size and we obtain the equilib¬ 
rium value from a weighted average over all runs at the 
largest size. Figure [^shows {A/3F), (AE), and (A/), 
the deviation of the estimators of the dimensionless free 
energy, energy and overlap near the origin from their re¬ 
spective equilibrium values, as a function of population 
size R for instance J4. The straight lines are theoretical 
curves from Eqs. (22) and (23) using the values o f pf and 
SA/PSF estimated at i? = 10® and given in Tab. We 
see that there is reasonable quantitative agreement with 
the predicted 1/R dependence of the systematic errors. 
The R — 10® data point is not shown because statistical 
errors in measuring the exact values (x4) are compara¬ 
ble here to systematic errors in (^). Probing the 1/R 
regime of systematic errors proved quite difficult because 
of the much larger statistical errors. For example, to suf¬ 
ficiently reduce statistical errors for instance J4 we used 
A4 = 32 000 independent runs to obtain the R = 10® av¬ 
erages (^) in Fig. and M = 5000 independent runs at 
Rq = 10® to obtain “exact” equilibrium values (^) from 
weighted averaging. 


Next, we examine the convergence of the various char¬ 
acteristic population sizes to their asymptotic values. 


Figure 10 shows the finite size estimators of p/, ps, and pt 
versus tEe population size R at which they are measured 
for instance J8. For this instance all of these quanti¬ 
ties have values near 10^ and their values are near their 
asymptotic values for the two largest population sizes for 
which R > lOp. The rapid convergence of p/ supports the 
hypothesis that equilibrium is approached as 1 /R. We do 
not show a similar graph for instance J4 because all three 
p measures are already saturated to their asymptotic val¬ 
ues within statistical errors even for smallest population 
sizes studied. 

Finally, we can also gain some insights into weighted 
averaging from the detailed study of a single instance. 
The question we address is whether a single run is sig¬ 
nificantly better than a weighted average with the same 
total population size. We computed the systematic error 
in the weighted average of the dimensionless free energy 
/3F of instance J8 for Rq = 10® and M = 10 and com¬ 
pared it to the systematic error for a single run with 
R = MRq = 10^. We used M — 1000 independent runs 
with R = 10® to compute the mean {f3F) and standard 
error of the weighted average. To compute the mean, we 
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FIG. 9: (Color online) Log-log plot showing the deviations 
from equilibrium (systematic errors) in the dimensionless free 
energy {A/3F) (red circles), energy, (AE) (blue squares), and 
overlap near the origin (A/) (green triangles) as a function of 
population size R for instance J8 at T = 0.2. The straight 
lines are theoretical curves based on Eqs. (221 and (23l. 


dent runs, we did not achieve sufficient statistical power 
to distinguish the weighted average clearly from a single 
long run although we expect the former to have some¬ 
what larger systematic errors. These considerations lead 
to the following conjecture: Suppose one has available a 
fixed total amount of work defined by a total population 
size, RqM such that Rq > pf, then the weighted aver¬ 
age obtained from M runs each with population size Rq 
is statistically indistinguishable from a single long run 
with population R = RqM. However, the discussion in 
Sec. |IV C| comparing PA and PT for disorder averaging 
is relevant here as well. If a sufficiently large disorder 
sample is simulated, the differences in systematic errors 
between a weighted average and a single long run could 
become relevant. While additional work to understand 
the systematic errors in weighted averaging is needed, it 
seems clear that weighted averaging is a useful tool for 
studying hard problems requiring very large total popu¬ 
lations. 


VII. DISCUSSION 



FIG. 10: (Color online) Log-log plot showing estimators of the 
equilibration sizes pf (red circles), pt (blue squares), and ps 
(green triangles) as a function of population size R a.tT — 0.2 
for instance J8. 


take M = 10 random values from the set of = 1000 
runs, compute the weighted average, and then take the 
mean of that weighted average over many such experi¬ 
ments. We used the blocking method to compute the 
standard error of the mean. We used M = 1000 runs 
with R = lO'^ to obtain (/3F) and its error. We found 
that (A/3F) = 0.75 ± 0.12, while (A/3F) = 0.63 ± 0.04. 
Thus, the weighted average has roughly the same sys¬ 
tematic errors as the single long run. Note that in this 
example, Rq < pf. We expect the differences between 
weighted averaging and a single long run to vanish as 
Ro/Pf oo. Unfortunately, even with 1000 indepen- 


We have shown that population annealing is an effec¬ 
tive and efficient algorithm for simulating spin glasses. 
It is comparably efficient to parallel tempering, the stan¬ 
dard in the field, and it has several advantages. 

The first advantage is that it is naturally a massively 
parallel algorithm. The convergence to equilibrium oc¬ 
curs as the population size grows and each replica in the 
population can be simulated independently. Since re¬ 
alistic spin-glass simulations using population annealing 
require population sizes of the order of 10® or more, there 
is a much greater scope for parallelism than in parallel 
tempering where in spin-glass simulations typically less 
than 100 replicas are simulated in parallel. To put this 
difference in perspective, recall that parallel tempering is 
a Markov-chain Monte Carlo algorithm while population 
annealing is a sequential Monte Carlo algorithm. From a 
computational complexity perspective, when going from 
a Markov-chain Monte Carlo algorithm to a sequential 
Monte Carlo algorithm, time is exchanged for hardware 
so that long running times can be exchanged for mas¬ 
sive parallelism. The downside of exchanging time for 
hardware is that population annealing has much larger 
memory requirements than parallel tempering. 

A second advantage of population annealing is access 
to weighted averaging, which allows multiple independent 
runs of PA to be combined to improve both statistical and 
systematic errors. Weighted averaging opens the door 
to distributed computing. It is potentially possible to 
quickly simulate very difficult to equilibrate instances of 
spin glasses or other hard statistical-mechanical models 
by distributing the work over a large and inhomogeneous 
set of computational resources. The only information 
that needs to be collected and analyzed centrally from 
each run is the estimators of observables together with 
the estimator of the free energy. 
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Apart from its large memory usage, the main disad¬ 
vantage of population annealing (and sequential Monte 
Carlo methods in general) is that it converges to equilib¬ 
rium inversely in population size whereas parallel temper¬ 
ing (and Markov-chain Monte Carlo methods in general) 
converges exponentially. In most situations, this differ¬ 
ence is moot because statistical errors are much larger 
than systematic errors. However, for very high precision 
disorder averages, it is possible that the exponential con¬ 
vergence of parallel tempering would be an advantage 
over the power law convergence of population annealing. 

Thus far, the implementations of population anneal¬ 
ing for large-scale simulations have used a simple anneal¬ 
ing schedule. The temperature set is uniform in the in¬ 
verse temperature and there are a constant number of 
Metropolis sweeps at each temperature. It is plausible 
that a more complicated annealing schedule might be 
more efficient. It is perhaps possible that the anneal¬ 
ing schedule can be adaptively adjusted to the particular 
problem instance in analogy to related proposals for par¬ 
allel tempering simulations [Hill]. It might also improve 
efficiency to change the population size with tempera¬ 
ture. In addition, our implementation uses the Metropo¬ 
lis algorithm at every temperature, however, at low tem¬ 
peratures kinetic Monte Carlo might be preferable and, 
at intermediate temperatures cluster moves, might be 
useful [IS]. 

Population annealing is a general method suitable for 
simulating equilibrium states of systems with rough free- 
energy landscapes. It can be applied to any system for 
which there is a parameter, such as temperature, that 
takes the equilibrium distribution from an easy to simu¬ 
late region, e.g., at high temperature, to a hard to simu¬ 


late region, e.^., at low temperature. In addition to spin 
systems, population annealing may prove useful in sim¬ 
ulating the equilibrium states of dense fluids or complex 
biomolecules. 
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